AITopics | speech intelligibility

Collaborating Authors

speech intelligibility

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Condition-Invariant fMRI Decoding of Speech Intelligibility with Deep State Space Model

Sung, Ching-Chih, Suzuki, Shuntaro, Chien, Francis Pingfan, Sugiura, Komei, Tsao, Yu

arXiv.org Artificial IntelligenceNov-5-2025

Clarifying the neural basis of speech intelligibility is critical for computational neuroscience and digital speech processing. Recent neuroimaging studies have shown that intelligibility modulates cortical activity beyond simple acoustics, primarily in the superior temporal and inferior frontal gyri. However, previous studies have been largely confined to clean speech, leaving it unclear whether the brain employs condition-invariant neural codes across diverse listening environments. To address this gap, we propose a novel architecture built upon a deep state space model for decoding intelligibility from fMRI signals, specifically tailored to their high-dimensional temporal structure. We present the first attempt to decode intelligibility across acoustically distinct conditions, showing our method significantly outperforms classical approaches. Furthermore, region-wise analysis highlights contributions from auditory, frontal, and parietal regions, and cross-condition transfer indicates the presence of condition-invariant neural codes, thereby advancing understanding of abstract linguistic representations in the brain.

artificial intelligence, intelligibility, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2511.01868

Country: Asia (0.15)

Genre: Research Report > New Finding (0.71)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.87)
Information Technology > Artificial Intelligence > Cognitive Science (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Room acoustics affect communicative success in hybrid meeting spaces: a pilot study

Einig, Robert, Janscha, Stefan, Schuster, Jonas, Koch, Julian, Hagmueller, Martin, Schuppler, Barbara

arXiv.org Artificial IntelligenceSep-16-2025

Since the COVID-19 pandemic in 2020, universities and companies have increasingly integrated hybrid features into their meeting spaces, or even created dedicated rooms for this purpose. While the importance of a fast and stable internet connection is often prioritized, the acoustic design of seminar rooms is frequently overlooked. Poor acoustics, particularly excessive reverberation, can lead to issues such as misunderstandings, reduced speech intelligibility or cognitive and vocal fatigue. This pilot study investigates whether room acoustic interventions in a seminar room at Graz University of Technology support better communication in hybrid meetings. For this purpose, we recorded two groups of persons twice, once before and once after improving the acoustics of the room. Our findings -- despite not reaching statistical significance due to the small sample size - indicate clearly that our spatial interventions improve communicative success in hybrid meetings. To make the paper accessible also for readers from the speech communication community, we explain room acoustics background, relevant for the interpretation of our results.

artificial intelligence, experiment, participant, (15 more...)

arXiv.org Artificial Intelligence

2509.11709

Country: Europe > Austria > Styria > Graz (0.25)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Collaboration (0.94)

Add feedback

RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching

Park, Hyun Joon, Liu, Jeongmin, Kim, Jin Sob, Yang, Jeong Yeol, Han, Sung Won, Song, Eunwoo

arXiv.org Artificial IntelligenceJun-23-2025

We introduce RapFlow-TTS, a rapid and high-fidelity TTS acoustic model that leverages velocity consistency constraints in flow matching (FM) training. Although ordinary differential equation (ODE)-based TTS generation achieves natural-quality speech, it typically requires a large number of generation steps, resulting in a trade-off between quality and inference speed. To address this challenge, RapFlow-TTS enforces consistency in the velocity field along the FM-straightened ODE trajectory, enabling consistent synthetic quality with fewer generation steps. Additionally, we introduce techniques such as time interval scheduling and adversarial learning to further enhance the quality of the few-step synthesis. Experimental results show that RapFlow-TTS achieves high-fidelity speech synthesis with a 5- and 10-fold reduction in synthesis steps than the conventional FM- and score-based approaches, respectively.

artificial intelligence, machine learning, rapflow-tts, (15 more...)

arXiv.org Artificial Intelligence

2506.16741

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (0.84)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Enhancing Cochlear Implant Signal Coding with Scaled Dot-Product Attention

Essaid, Billel, Kheddar, Hamza, Batel, Noureddine

arXiv.org Artificial IntelligenceApr-29-2025

--Cochlear implants (CIs) play a vital role in restoring hearing for individuals with severe to profound sensorineural hearing loss by directly stimulating the auditory nerve with electrical signals. While traditional coding strategies, such as the advanced combination encoder (ACE), have proven effective, they are constrained by their adaptability and precision. This paper investigates the use of deep learning (DL) techniques to generate electrodograms for CIs, presenting our model as an advanced alternative. We compared the performance of our model with the ACE strategy by evaluating the intelligibility of reconstructed audio signals using the short-time objective intelligibility (STOI) metric. The results indicate that our model achieves a STOI score of 0.6031, closely approximating the 0.6126 score of the ACE strategy, and offers potential advantages in flexibility and adaptability. This study underscores the benefits of incorporating artificial intelligent (AI) into CI technology, such as enhanced personalization and efficiency.

ace strategy, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICTIS62692.2024.10894163

2504.19046

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Consumer Health (0.93)
Health & Medicine > Therapeutic Area > Otolaryngology (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Voice Biomarker Analysis and Automated Severity Classification of Dysarthric Speech in a Multilingual Context

Yeo, Eunjung

arXiv.org Artificial IntelligenceNov-30-2024

Dysarthria, a motor speech disorder, severely impacts voice quality, pronunciation, and prosody, leading to diminished speech intelligibility and reduced quality of life. Accurate assessment is crucial for effective treatment, but traditional perceptual assessments are limited by their subjectivity and resource intensity. To mitigate the limitations, automatic dysarthric speech assessment methods have been proposed to support clinicians on their decision-making. While these methods have shown promising results, most research has focused on monolingual environments. However, multilingual approaches are necessary to address the global burden of dysarthria and ensure equitable access to accurate diagnosis. This thesis proposes a novel multilingual dysarthria severity classification method, by analyzing three languages: English, Korean, and Tamil.

classification, data mining, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2412.12111

Country:

North America > United States (0.14)
North America > Canada > Quebec > Montreal (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.82)
Health & Medicine > Therapeutic Area > Neurology > Parkinson's Disease (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(5 more...)

Add feedback

TSELM: Target Speaker Extraction using Discrete Tokens and Language Models

Tang, Beilong, Zeng, Bang, Li, Ming

arXiv.org Artificial IntelligenceSep-16-2024

We propose TSELM, a novel target speaker extraction network that leverages discrete tokens and language models. TSELM utilizes multiple discretized layers from WavLM as input tokens and incorporates cross-attention mechanisms to integrate target speaker information. Language models are employed to capture the sequence dependencies, while a scalable HiFi-GAN is used to reconstruct the audio from the tokens. By applying a cross-entropy loss, TSELM models the probability distribution of output tokens, thus converting the complex regression problem of audio generation into a classification task. Experimental results show that TSELM achieves excellent results in speech quality and comparable results in speech intelligibility.

arxiv preprint arxiv, mixed speech, speech, (13 more...)

arXiv.org Artificial Intelligence

2409.07841

Country: Asia > China (0.05)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

Add feedback

An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios

Gong, Cheng, Cooper, Erica, Wang, Xin, Qiang, Chunyu, Geng, Mengzhe, Wells, Dan, Wang, Longbiao, Dang, Jianwu, Tessier, Marc, Pine, Aidan, Richmond, Korin, Yamagishi, Junichi

arXiv.org Artificial IntelligenceJun-13-2024

Self-supervised learning (SSL) representations from massively multilingual models offer a promising solution for low-resource language speech tasks. Despite advancements, language adaptation in TTS systems remains an open problem. This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system proposed in our previous work. We conducted experiments on 12 languages using limited data with various fine-tuning configurations. We demonstrate that the similarity in phonetics between the pre-training and target languages, as well as the language category, affects the target language's adaptation performance. Additionally, we find that the fine-tuning dataset size and number of speakers influence adaptability. Surprisingly, we also observed that using paired data for fine-tuning is not always optimal compared to audio-only data. Beyond speech intelligibility, our analysis covers speaker similarity, language identification, and predicted MOS.

fine-tuning, representation, similarity, (16 more...)

arXiv.org Artificial Intelligence

2406.08911

Country:

Asia > China > Tianjin Province > Tianjin (0.05)
North America > Canada (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

No More Mumbles: Enhancing Robot Intelligibility through Speech Adaptation

Ren, Qiaoqiao, Hou, Yuanbo, Botteldooren, Dick, Belpaeme, Tony

arXiv.org Artificial IntelligenceMay-15-2024

Spoken language interaction is at the heart of interpersonal communication, and people flexibly adapt their speech to different individuals and environments. It is surprising that robots, and by extension other digital devices, are not equipped to adapt their speech and instead rely on fixed speech parameters, which often hinder comprehension by the user. We conducted a speech comprehension study involving 39 participants who were exposed to different environmental and contextual conditions. During the experiment, the robot articulated words using different vocal parameters, and the participants were tasked with both recognising the spoken words and rating their subjective impression of the robot's speech. The experiment's primary outcome shows that spaces with good acoustic quality positively correlate with intelligibility and user experience. However, increasing the distance between the user and the robot exacerbated the user experience, while distracting background sounds significantly reduced speech recognition accuracy and user satisfaction. We next built an adaptive voice for the robot. For this, the robot needs to know how difficult it is for a user to understand spoken language in a particular setting. We present a prediction model that rates how annoying the ambient acoustic environment is and, consequentially, how hard it is to understand someone in this setting. Then, we develop a convolutional neural network model to adapt the robot's speech parameters to different users and spaces, while taking into account the influence of ambient acoustics on intelligibility. Finally, we present an evaluation with 27 users, demonstrating superior intelligibility and user experience with adaptive voice parameters compared to fixed voice.

intelligibility, robot, user experience, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/LRA.2024.3401117

2405.09708

Country: Europe > Belgium > Flanders > East Flanders > Ghent (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Crowdsourced Multilingual Speech Intelligibility Testing

Lechler, Laura, Wojcicki, Kamil

arXiv.org Artificial IntelligenceMar-21-2024

With the advent of generative audio features, there is an increasing need for rapid evaluation of their impact on speech intelligibility. Beyond the existing laboratory measures, which are expensive and do not scale well, there has been comparatively little work on crowdsourced assessment of intelligibility. Standards and recommendations are yet to be defined, and publicly available multilingual test materials are lacking. In response to this challenge, we propose an approach for a crowdsourced intelligibility assessment. We detail the test design, the collection and public release of the multilingual speech data, and the results of our early experiments.

assessment, intelligibility, participant, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICASSP48485.2024.10447869

2403.14817

Country:

Europe > Greece (0.04)
North America > United States > Rhode Island (0.04)
North America > United States > Massachusetts > Middlesex County > Sudbury (0.04)
(7 more...)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (1.00)

Add feedback

Cocktail Party Processing via Structured Prediction

Neural Information Processing SystemsMar-14-2024, 10:47:29 GMT

While human listeners excel at selectively attending to a conversation in a cocktail party, machine performance is still far inferior by comparison. We show that the cocktail party problem, or the speech separation problem, can be effectively approached via structured prediction. To account for temporal dynamics in speech, we employ conditional random fields(CRFs) to classify speech dominance within each time-frequency unit for a sound mixture. To capture complex, nonlinear relationship between input and output, both state and transition feature functions in CRFs are learned by deep neural networks. The formulation of the problem as classification allows us to directly optimize a measure that is well correlated with human speech intelligibility. The proposed system substantially outperforms existing ones in a variety of noises.

classification, classifier, speech intelligibility, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
North America > United States > Massachusetts > Plymouth County > Norwell (0.04)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback